Closed
Bug 157334
Opened 22 years ago
Closed 14 years ago
IE6 3x faster than Mozilla, concatenating two local strings
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
RESOLVED
FIXED
People
(Reporter: pschwartau, Unassigned)
References
Details
(Keywords: perf, testcase, Whiteboard: [For best timing results, save and run testcases locally])
Attachments
(8 files, 12 obsolete files)
6.88 KB, text/html
3.40 KB, text/plain
769 bytes, patch
2.98 KB, text/plain
153.93 KB, patch
126.25 KB, patch
3.60 KB, patch
3.42 KB, patch
This bug arose from the JS engine performance tracking bug 117611. It was noted that Mozilla under-performs on Peter Nilsson's string concatenation test at http://www.formula-one.nu/mozilla/jsTestcases/, which is structured schematically as follows:

function test()
{
  var str1 = a very long string;
  var str2 = a very long string;
  var str;
  for (var i = 0; i <= UBOUND; i++)
    str = str1 + str2;
}

I will attach this testcase below in an HTML format that can be run in NN4.x, Mozilla, and IE6. I will also attach a standalone JS version that I run via the load() function in the optimized JS shell.

The key thing to note is that all the variables are declared locally in the function. This should avoid some DOM overhead, since the DOM does security checks on all global variable access.

Here are results I got by trying the test in all three browsers and in the JS shell. The Mozilla build date I used was 2002-07-01.

OS=WinNT4.0(SP6); 500MHz CPU; 128M RAM

UBOUND = 10,000; average time after 10 runs (in ms):
IE6 = 314
JS  = 715
Moz = 1045
NN4 = 1853

Note: for comparison, here are timing results on an empty loop.

UBOUND = 100,000; average time after 10 runs (in ms):
IE6 = 151
JS  = 97
Moz = 167
NN4 = 2171

We see from the above that IE6 is more than 3x faster than Mozilla on the string concatenation test, and more than 2x faster than the JS shell -
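A self-contained sketch of this benchmark is below. The long strings and run count here are stand-ins, not the actual strings from Peter Nilsson's testcase, and console.log substitutes for the JS shell's print():

```javascript
// Hedged sketch of the benchmark described above. UBOUND and the
// 10-run averaging follow the schematic; the actual test strings in the
// attached testcase are different (these are generated stand-ins).
var UBOUND = 10000;
var RUNS = 10;

function test() {
  // All variables are local, as in the original, to avoid the DOM's
  // security checks on global variable access.
  var str1 = new Array(100).join("abcdefghij"); // a 990-char string
  var str2 = new Array(100).join("klmnopqrst"); // another 990-char string
  var str;
  for (var i = 0; i <= UBOUND; i++)
    str = str1 + str2;
  return str;
}

var total = 0;
for (var run = 0; run < RUNS; run++) {
  var start = Date.now();
  test();
  total += Date.now() - start;
}
console.log("Average time (in ms) = " + total / RUNS);
```

To run it in the JS shell or xpcshell, replace console.log with print and load the file via load() or the -f flag.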
Reporter
Comment 1•22 years ago
Reporter
Comment 2•22 years ago
Comment 3•22 years ago
I'm seeing even worse results. I did 10 iterations and got:
IE6 = 40.1
Moz = 490.4
I then ran it out to 70 iterations and got:
IE6 = 40.02
Moz = 500.6
Evaluations in IE ranged from 30-50 ms. Evaluations in Moz ranged from 440-620 ms.
On Win2k (SP2), 1200 MHz CPU, 512M RAM with Mozilla 0704.
Comment 4•22 years ago
Moz 2002071208: Average time (in ms) = 1048.3
Netscape 4.79: Average time (in ms) = 318.6
IE 6: Average time (in ms) = 58.2
Reporter
Comment 5•22 years ago
Jerry: can you give us the OS, CPU, and RAM of your PC? I'm surprised that NN4.79 is faster than Mozilla for you on this. What results do you get if you save the testcase locally and run it off your disk? That's the best way (or were you doing that already?). Also, do you have anything set in Mozilla's Edit > Preferences ? Thanks -
Reporter
Updated•22 years ago
Whiteboard: [For best timing results, save and run testcases locally]
Comment 6•22 years ago
Local files:
Mozilla: Average time (in ms) = 481.47
NN4: Average time (in ms) = 411.80
IE6: Average time (in ms) = 58.61
Windows XP; Athlon 800, 512MB PC-133
I have some prefs set to keep windows from popping up, but those are the only prefs I have set with regard to JS.
Comment 7•22 years ago
Phil, Jerry: please try the test in xpcshell and report results. /be
Reporter
Comment 8•22 years ago
Trying the above JS shell testcase in my xpcshell.

OS=WinNT4.0(SP6); 500MHz CPU; 128M RAM; Mozilla 20020701xx
JS shell: Average time (in ms) = 710
xpcshell: Average time (in ms) = 660
Mozilla:  Average time (in ms) = 1018

Most contributors will not have an optimized JS shell handy, but Mozilla automatically comes with xpcshell.exe.

STEPS TO TEST IN XPCSHELL:
1. Save the JS shell testcase locally as, say, Concat_Two_Strings.js
2. Save it in the same directory that your Mozilla binary is in.
3. Open a console window; for example, MS-DOS or Cygwin.
4. cd to your Mozilla directory.
5. At the command prompt, type ./xpcshell.exe -f Concat_Two_Strings.js
6. (RESULTS WILL PRINT OUT)
7. Hit the up arrow and <Enter> to try again, etc.
8. End your xpcshell session by typing the command quit()
Comment 9•22 years ago
XPCShell: Average time (in ms) = 260.4
Comment 10•22 years ago
Doing 100 runs: xpcshell = 204.3 Evaluations in xpcshell ranged from 190-211 ms.
Reporter
Comment 11•22 years ago
> The key thing to note is that all the variables are declared locally
> in the function. This should avoid some DOM overhead, since the DOM
> does security checks on all global variable access.
True, but in my adaptation of Peter Nilsson's test, I mistakenly
used a global variable |UBOUND| in the for-statement of the loop:
function test()
{
var str1= a very long string;
var str2= a very long string;
var str;
for(var i=0; i<=UBOUND; i++)
str = str1 + str2;
}
I will attach corrections below, and leave the two testcases
above in case anyone wants to compare how much DOM overhead
is incurred by a global |UBOUND|.
Reporter
Comment 12•22 years ago
Reporter
Comment 13•22 years ago
Attachment #91234 -
Attachment description: HTML testcase → HTML testcase (poorly scoped)
Attachment #91234 -
Attachment is obsolete: true
Attachment #91235 -
Attachment description: JS shell testcase → JS shell testcase (poorly scoped)
Attachment #91235 -
Attachment is obsolete: true
Reporter
Comment 14•22 years ago
It was noted in bug 117611 comment 63 that Mozilla appears to have slowed down on the browser version of the string concatenation test, when measured from the 1.0 release to today's current trunk. Brendan asked for pure JS engine testing on this, since the browser incorporates other modules which could be causing the slowdown.

I did this using the pure JS shell testcase in Comment #13 above. I compared today's current trunk vs. the CVS tag MOZILLA_1_0_RELEASE. As timeless notes in bug 117611 comment 65, to build via Makefile.ref it is necessary to copy over such subdirectories as js/src/editline/ and js/src/config/, since they didn't get the MOZILLA_1_0_RELEASE tag.

Here are my results on WinNT4.0(SP6), 500MHz CPU, 128M RAM. All times in ms:

                  MOZILLA_1_0_RELEASE   Current trunk (2003-02-17)
UBOUND = 10000    690                   681
UBOUND = 100000   6917                  6837

These show a nominal JS engine performance improvement on this test. So it looks like any observed slowdown on this is browser-related -
Updated•21 years ago
OS: Windows NT → All
Hardware: PC → All
Comment 15•21 years ago
Do we have any chance of finding out where in the browser this slowdown comes from?
Keywords: testcase
Comment 16•20 years ago
Athlon XP 1800+
Firefox 0.9: Average time (in ms) = 896.93; excluding first run = 891.78
IE 6: Average time (in ms) = 31.06; excluding first run = 29.92
Is there any chance of finding the reason?
Comment 17•20 years ago
Uh, that's way off, and I wonder whether you'd find a similar difference if you got rid of the string concatenation and just ran a simple loop test. Cc'ing jst, peterv, nallen. Obviously this could be profiled: jprof or another such sampler, or Quantify or another such instrumenter. Cc'ing bz in case he's back and interested. /be
Assignee: khanson → general
Component: JavaScript Engine → DOM: Level 0
Comment 18•20 years ago
There is also a big difference between Linux and Windows here. Firefox on Linux gave results of ~140 ms (4x slower than IE), while Firefox on Windows gave me results between 600 and 800 ms (20x slower).
Comment 19•20 years ago
I tried poking into this by looking at the difference in performance between browser and xpcshell. I had some trouble profiling because the browser seems to have two different performance modes: mode 1 was the baseline (280 ms), mode 2 took ~20% less time (220 ms), xpcshell took ~40% less time (160 ms), and IE took ~80% less time (60 ms).

Much of the test case time was spent in GC, so I focused on that. When doing concat, there's a memory allocation failure path:

js_ConcatStrings:182
js_NewString:2439
js_AllocGCThing:529
js_GC

which is the last-ditch allocator trying again. xpcshell ran GC through that path quite a bit, taking relatively short GCs. Mode 2 ran GC through that path an order of magnitude less frequently, taking fewer, longer GCs and spending a lot of time in js_MarkGCThing. Mode 1 almost never ran GC through that path; it was instead taking the JS_MaybeGC on the DOM branch callback and running many long GCs.

I can't duplicate the extremely bad performance numbers. This needs to be profiled on one of the machines that sees a 25x+ slowdown compared to IE. But I wouldn't be surprised if the % time in GC was huge.
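The "last ditch" pattern described in this comment (allocation fails against the GC size limit, the engine runs a GC, then retries once) can be illustrated with a toy heap. makeToyHeap, tryAllocate, and collectGarbage are hypothetical stand-ins for this sketch, not SpiderMonkey APIs:

```javascript
// Toy model of a GC heap with a hard allocation limit. In the toy,
// every collection frees everything, which is NOT how a real GC works;
// it only demonstrates the alloc -> GC -> retry control flow.
function makeToyHeap(limit) {
  var live = 0;
  return {
    tryAllocate: function () {
      if (live >= limit)
        return null; // over the limit, like hitting the gGCSize cap
      live++;
      return { id: live };
    },
    collectGarbage: function () {
      live = 0; // toy assumption: everything was garbage
    }
  };
}

// Mirrors the js_AllocGCThing -> js_GC path from the comment above:
// on failure, run a last-ditch GC and retry once before giving up.
function allocGCThing(heap) {
  var thing = heap.tryAllocate();
  if (!thing) {
    heap.collectGarbage();
    thing = heap.tryAllocate();
    if (!thing)
      throw new Error("out of memory");
  }
  return thing;
}
```

The trade-off discussed in the following comments is visible even in the toy: a small limit forces frequent last-ditch collections, while an unlimited heap never collects at all.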
Comment 20•20 years ago
Oh, these last ditch GCs are coming from http://lxr.mozilla.org/mozilla/search?string=gGCSize. Nallen, can you try setting that constant to 0xffffffff and rerunning? Also, benchmarking the js shell, built from js/src/Makefile.ref on Unixes with a few directories pulled (config, editline) in addition to what Mozilla builds pull, gives the baseline for JS perf. XPCShell has thread-safety enabled, although its costs have been optimized away in most cases. /be
Comment 21•20 years ago
I see no point, given the DOM's use of JS_GC and JS_MaybeGC, in having any last-ditch GC limit for XPConnect-based apps that we host. Other apps lacking our DOM code, but using SpiderMonkey and XPConnect, may care; they will have to make themselves heard, or suit themselves with better GC-calling from their own code. /be
Comment 23•20 years ago
I spun a build with unlimited gGCSize. FF dropped down to 170 ms on the test case, competitive with where xpcshell was. xpcshell is toast, though: it dropped from 160 ms to 130 ms, but it can't keep that up for long since it never runs a GC at all.
Comment 24•20 years ago
Is there any chance of getting a Windows build with this patch? I don't have an environment to build it myself.
Comment 25•20 years ago
I wouldn't recommend running with this patch. It speeds up the testcase, but bumping the GC limit to a larger yet still reasonable value would as well. When things go wrong with no limiter, they can go really wrong. Multiple lines of defense are a good thing here.
Updated•20 years ago
Flags: blocking1.8a4?
Flags: blocking1.7.x?
Updated•20 years ago
Flags: blocking1.8a4? → blocking1.8a4-
Comment 26•20 years ago
Taking -- I have some JS-engine-focused inspiration based on some recent work done for hardware-optimized Java VMs. /be
Assignee: general → brendan
Component: DOM: Level 0 → JavaScript Engine
Priority: -- → P1
Target Milestone: --- → mozilla1.8beta
Comment 27•20 years ago
"Stack" here refers to a virtual interpreter LIFO storage structure, not even the interpreter's stack, and not a machine (thread or process) stack. /be
Comment 28•20 years ago
Attachment #159677 -
Attachment is obsolete: true
Comment 29•20 years ago
Attachment #159693 -
Attachment is obsolete: true
Updated•20 years ago
Flags: blocking1.7.x? → blocking1.7.x-
Comment 30•20 years ago
The bad behavior on synthetic benchmarks that overallocate memory is no reason for blocking aviary1.0. I'll fix this soon enough on the trunk, and we'll get it into a shipping release soon enough. /be
Status: NEW → ASSIGNED
Flags: blocking-aviary1.0+ → blocking-aviary1.0-
Updated•20 years ago
Target Milestone: mozilla1.8beta1 → mozilla1.9alpha
Comment 31•19 years ago
Are we there yet? (soon enough, that is) ;-)
Comment 32•19 years ago
Tested with Firefox 20050621:
FF  -> Average time (UBOUND = 100000): 2650
IE6 -> Average time (UBOUND = 100000): 311
So, nope, no improvement in almost 3 years.
Comment 33•19 years ago
Well... On my Athlon 1.8, Gentoo with Gecko/20050605 Firefox/1.0+ it's
(100 tries, default UBOUND)
Average time (in ms) = 198
Attachment 96729 [details].
Opera 8.01:
Average time (in ms) = 256
Konq 3.4:
Average time (in ms) = 158.62
Comment 34•19 years ago
One of you commenters want to chip in more than complaints? The numbers for other browsers are welcome, but it would be good to know the machine configuration(s) used for these tests. What's more, try varying UBOUND from small to large and plot the results (attach csv or other data files, something we can chart with spreadsheets). That would be a help. My past comments reflected interest, but I didn't have the time. This bug is now targeted for 1.9alpha, which is right. No more complaining -- supply signal, not noise. /be
Comment 35•19 years ago
If I were to answer this the way most people are answered by owners when they make a request in Bugzilla, I would say, "Hey, it's open source. If you want the data, you are welcome to get it yourself."
Comment 36•19 years ago
(In reply to comment #35)
> If I were to answer this the way most people are answered by owners when they
> make a request in Bugzilla, I would say, "Hey, it's open source. If you want the
> data, you are welcome to get it yourself."

Of course, if Brendan were to answer in that fashion, he would say: "Hey, it's open source. If you want better performance, you are welcome to fix this bug yourself."
Comment 37•19 years ago
Jerry, take it somewhere else or I'll stop playing along. The whole point here is cooperation, not whining or grumbling, even (*especially*) if you have a point. No one owes you anything, and you don't owe me more data -- but it would help. Are you here to help? I am, but I'm kinda busy and I bet others could get the data faster/better and maybe even analyze it well. /be
Updated•19 years ago
Flags: testcase?
Comment 38•19 years ago
Checking in regress-157334-01.js;
/cvsroot/mozilla/js/tests/js1_5/String/regress-157334-01.js,v <-- regress-157334-01.js
initial revision: 1.1
Flags: testcase? → testcase+
Comment 39•18 years ago
(In reply to comment #38)
> Checking in regress-157334-01.js;
> /cvsroot/mozilla/js/tests/js1_5/String/regress-157334-01.js,v <--
> regress-157334-01.js
> initial revision: 1.1

Is that trunk, or branch? I know this is a trunk bug, but that was going to be done "soon" in Sept 2004, with a "shipping release" soon thereafter. Just wondering which branch this patch landed on, in case I missed something. I guess it is too late for 2.0 at this point, but one never knows. Maybe it can get slipped in if there is something else forcing another RC, and this can fit in since it was a pre-existing open bug and helps performance by 3x. Thanks.
Comment 40•18 years ago
(In reply to comment #39) That check-in is for a test, not a fix. This bug is still open, as you can see from its status ASSIGNED.
Updated•18 years ago
QA Contact: pschwartau → general
Comment 41•17 years ago
Work in progress: eliminate GCF_LOCK as a stored flag, move GCF_MUTABLE into the JSString length member. /be
Updated•17 years ago
Target Milestone: mozilla1.9alpha1 → mozilla1.9 M8
Comment 42•17 years ago
Attachment #275539 -
Attachment is obsolete: true
Comment 43•17 years ago
Comment on attachment 275567 [details] [diff] [review]
backup, v2

>Index: js/src/jsgc.h
> /* GC thing type indexes. */
> #define GCX_OBJECT 0 /* JSObject */
> #define GCX_STRING 1 /* JSString */
> #define GCX_DOUBLE 2 /* jsdouble */
>-#define GCX_MUTABLE_STRING 3 /* JSString that's mutable --
>- single-threaded only! */
>+#define GCX_SCRIPT 3 /* JSScript */
> #define GCX_FUNCTION 4 /* JSFunction */
...
>
> /* GC flag definitions, must fit in 8 bits (type index goes in the low bits). */
> #define GCF_TYPEMASK JS_BITMASK(GCX_NTYPES_LOG2)
> #define GCF_MARK JS_BIT(GCX_NTYPES_LOG2)
> #define GCF_FINAL JS_BIT(GCX_NTYPES_LOG2 + 1)
> #define GCF_SYSTEM JS_BIT(GCX_NTYPES_LOG2 + 2)
>-#define GCF_LOCKSHIFT (GCX_NTYPES_LOG2 + 3) /* lock bit shift */
>-#define GCF_LOCK JS_BIT(GCF_LOCKSHIFT) /* lock request bit in API */
>
>-/* Pseudo-flag that modifies GCX_STRING to make GCX_MUTABLE_STRING. */
>-#define GCF_MUTABLE 2
>+/* Request flag bits passed to js_NewGCThing but not stored in memory. */

I am curious what is wrong with GCF_LOCK, given that the storage to store the flag is available in any case? For strings it saves an entry per hash table, and the patch from bug 386265 comment 14 assumes this efficiency.
Comment 44•17 years ago
I was just backing up patches, also trying to free a flag bit for a purpose I've satisfied otherwise. So, thanks for the comment -- you are right, no need to lose the lock flag, which saves us a double hashtable entry for primitives. /be
Comment 45•17 years ago
I restored GCF_LOCK (my baby from 1996) and removed all the mutable-string farbling, so this should not conflict with the patch for bug 391290. I'll review that bug's patch today. /be
Attachment #275567 -
Attachment is obsolete: true
Comment 46•17 years ago
Igor, I'd appreciate your comments and early review. This is working for simple tests, with very promising results (especially in terms of process memory size staying small); I'll pound on it tonight. The escape barrier does not compact the local heap when a function returns, instead it hopes for good LIFO allocation patterns and return values. I'll instrument that, too, and clean up the ad-hoc instrumentation. Also TBD: JS_NewContextLocalHeap -- this patch just bloats every context with a local heap. /be
Attachment #275818 -
Attachment is obsolete: true
Attachment #276584 -
Flags: review?(igor)
Comment 47•17 years ago
Oops, must remember to upload the right patch. /be
Attachment #276584 -
Attachment is obsolete: true
Attachment #276586 -
Flags: review?(igor)
Attachment #276584 -
Flags: review?(igor)
Comment 48•17 years ago
In the patch I think a fallible write barrier can be avoided if escaped things are just marked as such. Then, when the local GC detects that the amount of escaped things reaches a particular threshold, the whole arena is transferred to the global list.
Comment 49•17 years ago
Transferring the arena on overflow trap to the global heap is a good idea. I did not think of doing that also if there are too many escapees. Will try tomorrow. /be
Attachment #276586 -
Attachment is obsolete: true
Attachment #276606 -
Flags: review?(igor)
Attachment #276586 -
Flags: review?(igor)
Comment 50•17 years ago
(In reply to comment #49)
> Created an attachment (id=276606) [details]
> important bugfix, plus tracing optimization
>
> Transferring the arena on overflow trap to the global heap is a good idea. I
> did not think of doing that also if there are too many escapees.

Escape from a frame being popped is not the issue, though -- it's the heap trap that occurs when a local thing (string or number) is stored in a (global, of course) object or root. In such a case, we can try to keep the thing local, but we won't be able to reduce lastLimit to at or below its offset from the first page in the arena.

This will result in more local GCs and freelist-based local GC-thing allocations. It will increase the cost of local GC relative to allocation. That can hurt, at least for the real-nbody.js benchmark (before I added escape trap handling, i.e. fp->lastLimit and js_EscapeBarrier), a lot (order 10x slowdown, I'll post numbers tomorrow).

I'll try it and measure, but I'm worried about pathological performance. /be
Comment 51•17 years ago
(In reply to comment #50)
> Escape from a frame being popped is not the issue, though -- it's the heap trap
> that occurs when a local thing (string or number) is stored in a (global, of
> course) object or root. In such a case, we can try to keep the thing local, but
> we won't be able to reduce lastLimit to at or below its offset from the first
> page in the arena.

But why use lastLimit and not a flag for local things that became global? If js_GetGCThingFlags is slow for that, then what about using CPU pages for GC arenas, so accessing the flag field would be very simple bit manipulation without any ifs? MMgc has page allocation code that can be used for that. With the flag, the only consequence of a thing's escape would be a decreased amount of local storage, until the local GC decides to transfer the whole arena to the global GC.

> This will result in more local GCs and freelist-based local GC-thing
> allocations. It will increase the cost of local GC relative to allocation. That
> can hurt, at least for the real-nbody.js benchmark (before I added escape trap
> handling, i.e. fp->lastLimit and js_EscapeBarrier), a lot (order 10x slowdown,
> I'll post numbers tomorrow).
>
> I'll try it and measure, but I'm worried about pathological performance.
Comment 52•17 years ago
(In reply to comment #51)
> But why use lastLimit and not a flag for local things that became global?

That's why I was trying to liberate GCF_LOCK (comment 43 and comment 45). But I would prefer to minimize change so as to get this performance and memory win into 1.9. We can move to mmap-based page allocation in a later patch, if there is time. [Since I wrote that, hours ago] Thanks for filing this now as bug 392263.

> With the flag the only consequence of thing's escape would be decreased amount
> of local storage until local GC decides to transfer the whole arena to the
> global GC.

What if we don't transfer the whole local heap arena, so some local things are referenced by global things? We will need to unify global and local mark phases, or the local GC for this local heap's context will sweep the now-globally-referenced, locally-unreferenced thing. The way local GC works, we don't want global things pointing to local things, or we may end up putting local things on a global or per-thread freelist.

The case we hope is much more typical than transferring the whole arena to the global GC is to "pop" LIFO-allocated things and empty the arena, leaving it ready for more local, mostly-LIFO allocations. /be
Comment 53•17 years ago
Question about dependent strings. When a local dependent string becomes global, the patch AFAICS does not make global its base. In this case what prevents the base from being collected via local GC?
Comment 54•17 years ago
(In reply to comment #53) > Question about dependent strings. > > When a local dependent string becomes global, the patch AFAICS does not make > global its base. In this case what prevents the base from being collected via > local GC? That's a bug, thanks for pointing it out (I had it in the back of my mind, chewing a hole in my brain ;-) -- new patch shortly. /be
Comment 55•17 years ago
Attachment #276606 -
Attachment is obsolete: true
Attachment #276850 -
Flags: review?(igor)
Attachment #276606 -
Flags: review?(igor)
Comment 56•17 years ago
(In reply to comment #55)
> Created an attachment (id=276850) [details]
> dependent string fixes

Is the idea here to allow dependent strings to exist only in the local heap? But then s += string, where s is a global variable, would not be able to benefit from the patch, unless I miss something.
Comment 57•17 years ago
The patch currently does not deal with a GC_GLOBAL_WRITE that fails. How would that be addressed?
Comment 58•17 years ago
(In reply to comment #56)
> Is the idea here is to allow for dependent strings to exists only in the local
> heap? But then s += string where s is a global variable would not be able to
> benefit from the patch unless I miss something.

Any global write of a local value must global-ize the local. The dependent string case could be handled by extending the tracer to keep track of multiple things to move, and that could be worthwhile. But I would like to instrument more and keep the code simple, and see whether this dependent-local-string-written-to-global case is indeed a hard case.

Also, the result of + is an independent string, not a dependent string. The left operand (s in your example) would be a global dependent string whose base is the global independent string produced by JSOP_ADD, even though string (the right hand side) is local. See the JOF_GLOBALWRITE testing in jsinterp.c. With more static analysis, we could do better, but that is for ActionMonkey and for later.

(In reply to comment #57)
> The patch currently does not deal with GC_GLOBAL_WRITE that fails. How that
> would be addressed?

About as badly as the js_MakeStringImmutable failing in js_SetSlotThreadSafe, which you filed as bug 363059 :-P. If there's an OOM error under js_LocalGC called from the write barrier, the value written to the global-heap-contained slot will be null. To fix this I would need to make the *OBJ_SET_* macros fallible. It could be done, but I'm trying to get benchmark results ignoring this little issue, so I can compare with an alternative patch along the lines you suggested of allowing local things to become reachable from globals, flagging them so they do not get collected by a local GC, and merging global and local mark phases. /be
Comment 59•17 years ago
(In reply to comment #58)
> About as badly as the js_MakeStringImmutable failing in js_SetSlotThreadSafe,
> which you filed as bug 363059 :-P.

I do not like the idea of infecting more code with that bug.

> If there's an OOM error under js_LocalGC
> called from the write barrier, the value written to the global-heap-contained
> slot will be null. To fix this I would need to make the *OBJ_SET_* macros
> fallible.

An alternative recovery is to make the local arena global on errors. Then js_LocalGC never fails.

> It could be done, but I'm trying to get benchmark results ignoring this little
> issue,

What if fixing the issue (which I do not see as a small one) would affect the benchmarks enough to alter the conclusion?
Comment 60•17 years ago
(In reply to comment #59)
> I do not like the idea of infecting more code with that bug.

Agreed, and so I won't commit this patch or any that has the null-on-OOM-from-localGC bug. My review request was more for early eyes on code, not to get the patch landed. I'm still debugging it, in fact.

> An alternative recovery is to make the local arena global on errors. Then
> js_LocalGC never fails.

Not bad for a solution that should not affect benchmarks.

> What if fixing the issue (which I do not see as small one) would affect the
> benchmarks to alter the conclusion?

I think you just solved that problem ;-). /be
Comment 61•17 years ago
This is merged up to top of trunk as of now. /be
Attachment #276850 -
Attachment is obsolete: true
Attachment #276850 -
Flags: review?(igor)
Comment 62•17 years ago
This patch improves performance and memory use for the string concatenation benchmark in this bug, and the double-allocating benchmark in bug 161110. So does the last "plan A" patch (attachment 277024 [details] [diff] [review]), which uses moving GC lightly to preserve LIFO local heap management as much as possible.

But for the real-nbody.js test, which jresig hosts and which originated (AFAIK) in the Debian language shootout, this patch makes things a lot slower. That's because it overcollects compared to the stack-like case in the last patch, which can finalize temporaries without first having to mark, when popping a frame.

I'll continue with "plan A", working to remove fallibility in the write barrier. /be
Comment 63•17 years ago
(In reply to comment #62)
> But for the real-nbody.js test, which jresig hosts and which originated (AFAIK)
> in the Debian language shootout, this patch makes things a lot slower. That's
> because it overcollects compared to the stack-like case in the last patch,
> which can finalize temporaries without first having to mark, when popping a
> frame.

I suspect that what slowed math-intensive benchmarks with the patch is not "overcollecting" but rather "overscanning". That is, the patch makes LocalGCSweep more expensive, since now it has to scan over escaped things. This means that a copying collector for doubles would benefit SM even more than just a local heap, since a copying GC never scans the garbage, of which calculations with double values produce a lot. It is too bad that due to API compatibility this is not possible for the whole heap and can only be done for a local copy that embeddings do not see.

Still, it should be possible to reduce the scanning overhead within the current API by marking doubles in groups. Then the scanning code can put the whole group on the free list with a single check instead of scanning all group members. With bug 392263 implemented, such grouping should be straightforward to support.
Comment 64•17 years ago
To Brendan: The comments in this bug mention the real-nbody.js benchmark a few times. Where can I find it? I would like to try it out with the patch from bug 400902.
Comment 65•17 years ago
(In reply to comment #64)
> The comments in the bug mention few times real-nbody.js benchmark.

See http://ejohn.org/apps/speed/ under Real-World / N-Body. IIRC this test came from the "great language shootout" benchmark suite at http://shootout.alioth.debian.org/ /be
Updated•17 years ago
Assignee: brendan → igor
Status: ASSIGNED → NEW
Comment 66•17 years ago
http://ejohn.org/blog/javascript-engine-speeds/ is a link to various JS benchmarks that includes the nbody one.
Comment 68•17 years ago
> Re: comment 10 -- I commented on that blog. The code shown there has a couple
> of bugs, and we would not want to round up to powers of two past a certain
> point. We often go linear at a threshold: 2, 4, 8, ... 1024, 2048, 3072, 4096,
> ... although of course you can measure O(n^2) compounding of the linear cost
> function if you try hard enough.

re: http://www.yafla.com/dforbes/String_Concatenation_and_Immutable_Strings_Speeding_Spidermonkey#a412 (in case this isn't already discussed herein)

Fixing the bugs in his implementation and switching to linear growth at some small threshold seem reasonable, and maybe this is worth developing just to give it a real try against real-world data. I still think that even with small linear growth at a small threshold, there is a non-trivial footprint cost associated with this idea.
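The growth schedule in the quoted comment (double up to a threshold, then grow by fixed linear steps: 2, 4, 8, ..., 1024, 2048, 3072, 4096, ...) can be sketched as a capacity-rounding function. The 1024 threshold and step size here are inferred from that example sequence, not taken from SpiderMonkey's actual constants:

```javascript
// Round a requested capacity up: doubling below LINEAR_THRESHOLD,
// fixed LINEAR_STEP increments above it, matching the sequence
// 2, 4, 8, ..., 1024, 2048, 3072, 4096, ... from the comment above.
var LINEAR_THRESHOLD = 1024;
var LINEAR_STEP = 1024;

function roundUpCapacity(n) {
  if (n <= LINEAR_THRESHOLD) {
    var cap = 2;
    while (cap < n)
      cap *= 2; // exponential growth for small sizes
    return cap;
  }
  // Above the threshold, round up to the next multiple of LINEAR_STEP,
  // avoiding the footprint blowup of doubling at large sizes.
  return Math.ceil(n / LINEAR_STEP) * LINEAR_STEP;
}
```

For example, roundUpCapacity(1025) lands on 2048 and roundUpCapacity(2049) on 3072, continuing the linear part of the sequence.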
Comment 69•17 years ago
Did the patch from comment #21 ever get any traction?
Comment 70•17 years ago
(In reply to comment #68)
> Fixing the bugs in his implementation and switching to linear-growth at some
> small threshold seem reasonably, and maybe this is worth developing just to
> give it a real try against real-world data. I still think even with small
> linear growth, at a small threshold, there is a non-trivial footprint cost
> associated with this idea.

How about you work up a patch -- we land it, let it go through Talos and a couple of days of nightlies, and then back it out?
Comment 71•16 years ago
Any chance of getting this in, or should we move to TM? I can reproduce it in the latest Fx3 nightly -- the average is 67, compared to 16 in IE7 (HTML testcase).
Comment 72•16 years ago
Assignee: general → crowder
Status: NEW → ASSIGNED
Comment 74•16 years ago
The way the old else clause was written did not agree with house style, so I cleaned it up; the majority of the changes are whitespace.
Comment 75•16 years ago
The spacing still looks wrong, and not just the spacing -- how about a plain (space-sensitive, I mean, but still cvs diff -pu8) diff?

More important, and I've said this before: exponentiation explodes, so we should switch to linear growth at some threshold. Doubling is wrong at large sizes.

/be
Comment 76•16 years ago
Brendan: You don't like the diff from comment #73, as a diff? This needs more testing and it's buggy, too (makes the assumption that power-of-2 alloc happened on creation of the first string, which is - I think - false), but worse it doesn't seem to yield much of a bonus in perf-tests, if anything. I never sought review on this patch for that reason.
Comment 77•16 years ago
With jemalloc we definitely don't want to second-guess realloc's underlying size classification. This patch is not the way, so let's put energy into other avenues.

/be
Comment 78•16 years ago
Yep, agreed, which is why I abandoned work on it after trying the easy hack. I think perhaps we should WONTFIX this ancient bug (the summary is sort of bogus anyway).
Comment 79•16 years ago
We should not WONTFIX this bug without re-running the relevant benchmarks from it, noting the performance, and adjusting the summary if the bug should stay open. Just because one possible patch is invalid does not mean the bug should be marked WONTFIX -- that label is for a bug that's a real malfunction by some measure, but that we believe does not matter, or is worth the costs for other reasons.

/be
Comment 80•16 years ago
Throwing this one back in the pool.
Assignee: crowder → general
Status: ASSIGNED → NEW
Priority: P1 → --
Target Milestone: mozilla1.9alpha8 → ---
(We get clobbered by IE9 on this: we're bimodal at either 70 or 250ms, and they are sub-1ms. Seems like they might be doing some analysis!)
Comment 82•14 years ago
A constant propagation that works across branches/joins would pick this up. Bug 608346 does this, but doesn't constant fold string concatenation (easy to fix). How does the comparison look when the strings are not constant?
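As a toy illustration of the folding suggested above (hypothetical -- not the mechanism of bug 608346): when both operands of `+` are compile-time-constant strings, an engine can compute the concatenation once rather than on every iteration of the loop. The function name here is made up for the sketch:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy constant folder for string concatenation: given two operands
 * known to be constant strings, produce the folded result a single
 * time.  Real engines do this at the bytecode or IR level; this is
 * only a sketch of the data movement involved. */
static char *fold_concat(const char *lhs, const char *rhs)
{
    size_t llen = strlen(lhs), rlen = strlen(rhs);
    char *out = malloc(llen + rlen + 1);
    if (!out)
        return NULL;
    memcpy(out, lhs, llen);
    memcpy(out + llen, rhs, rlen + 1);  /* +1 copies the terminating NUL */
    return out;
}
```

Once folded, the loop body `str = str1 + str2` degenerates into a constant store, which is presumably why IE9's times drop to the sub-millisecond range on the constant-operand version of the test.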
If I make the function look like

    var str1='qwertyuiop...';
    function test() {
        var str2='qwertyu...';
        var str;
        for(var i=0; i<=UBOUND; i++)
            str = str1 + str2;
        return str;
    }

IE9 looks like 0-2 ms. If I make it return str[40], it goes to about 2-4 ms. If I make the inner loop be

    str = str1[i % str1.length] + str2;

in addition to the other changes, IE9 is about 5-9 ms, and we cut down to 30 -- because we're copying half as much string data, if I have to guess. We may have to accept that they just have fast strings. :-)
Comment 84•14 years ago
Our problem here is that even though we make a rope when doing str1 + str2, we also have to allocate a ballast to hold the flattened string on every such concatenation, which is 4KB for these strings. If I comment out the code allocating this ballast my time on a microbenchmark adding these qwerty strings goes from 40ms to .5ms. Blocking on bug 608776, which should fix this.
Depends on: 608776
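For context, a rope makes concatenation O(1) by allocating only a small node pointing at both halves and deferring the character copy until the string is actually inspected; the problem described above was that the flattening buffer (the "ballast") was allocated eagerly on every concatenation anyway. A stripped-down sketch of the lazy approach -- illustrative only, not SpiderMonkey's JSRope:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy rope: a leaf holds characters; a concat node only points at two
 * children.  Character data is copied lazily, in rope_flatten, rather
 * than eagerly at concatenation time.  Error handling is elided. */
typedef struct Rope {
    const struct Rope *left, *right;  /* non-NULL for concat nodes */
    const char *chars;                /* non-NULL for leaves */
    size_t len;
} Rope;

static Rope *rope_leaf(const char *s)
{
    Rope *r = calloc(1, sizeof *r);
    r->chars = s;
    r->len = strlen(s);
    return r;
}

static Rope *rope_concat(const Rope *a, const Rope *b)
{
    /* O(1): no character data is copied here. */
    Rope *r = calloc(1, sizeof *r);
    r->left = a;
    r->right = b;
    r->len = a->len + b->len;
    return r;
}

static void rope_write(const Rope *r, char *out)
{
    if (r->chars) {
        memcpy(out, r->chars, r->len);
    } else {
        rope_write(r->left, out);
        rope_write(r->right, out + r->left->len);
    }
}

static char *rope_flatten(const Rope *r)
{
    char *buf = malloc(r->len + 1);   /* allocated only on demand */
    rope_write(r, buf);
    buf[r->len] = '\0';
    return buf;
}
```

In the benchmark's loop, `str1 + str2` would build only a concat node per iteration; the flatten cost is paid once, if at all, when the characters are read.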
Comment 85•14 years ago
This has been resolved via bug 609440, which made JS_GetStringChars() fallible, removing the requirement of maintaining a buffer alongside the rope. The testcase from Comment 13 currently reports 3.5ms average time on tracemonkey linux-x64-opt. Closing.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 86•14 years ago
Hi, I'm on Windows XP, setting UBOUND = 100000 and running 30 times.

Using the latest nightly (Gecko/20110113 Firefox/4.0b10pre) I get: Average time (in ms) = 6.36
Using IE8 I get: Average time (in ms) = 92.26
Using Chrome 8.0.552.237 I get: Average time (in ms) = 2.0

So, 3x slower than Chrome -- something to investigate? Sorry to post on a closed bug.
Comment 87•14 years ago
3x slower than Chrome is something we should investigate in a new bug...
Comment 88•14 years ago
Reported as bug 625615.